feat: LangChain-enhanced task completion detection for keepalive #459
- Add llm_provider.py with GitHub Models → OpenAI → regex fallback chain
- Add codex_jsonl_parser.py for parsing Codex --json event streams
- Add codex_session_analyzer.py for task completion detection
- Add langchain optional dependency to pyproject.toml
- Add comprehensive tests for all new modules (38 tests)
- Add integration plan document with data source options

This implements Phase 0 of the LangChain keepalive integration:

- Option A: Summary only (current --output-last-message)
- Option B: Full JSONL stream (--json mode) - recommended
- Option B filtered: High-value events only (agent_message, reasoning, todo_list)

The JSONL parser handles Codex event schema variations, including:

- Old (assistant_message) and new (agent_message) field names
- Streaming item updates
- Todo list items for direct task mapping

Refs #453
- Add SESSION_JSONL variable for PR-specific session file naming
- Change Codex execution to use the --json flag, redirecting the JSONL stream to a file
- Add an 'Analyze Codex session' step that parses session data with codex_jsonl_parser
- Output session metrics (events, messages, commands, file changes, todos)
- Include codex-session*.jsonl in artifact uploads

Part of #454: LangChain-enhanced task completion detection
- Add scripts/analyze_codex_session.py CLI for session analysis
  - Extract tasks from PR body checkboxes
  - Run LLM analysis via GitHub Models API (with OpenAI/regex fallback)
  - Support JSON, markdown, and github-actions output formats
  - Update PR body checkboxes based on completion detection
- Enhance workflow with a dedicated LLM analysis step
  - New 'Analyze task completion with LLM' step after session parsing
  - Fetches the PR body via gh CLI to extract tasks
  - Outputs completion results to GITHUB_OUTPUT
- Add 17 tests for the CLI (100% pass)
  - Task extraction from PR body
  - Checkbox update logic
  - CLI integration tests

Part of #454: LangChain-enhanced task completion detection
- Add llmCompletedTasks parameter to autoReconcileTasks()
  - LLM tasks take priority over commit-based analysis
  - Commit analysis adds supplementary matches not covered by LLM
  - Deduplicates matches by task text (case-insensitive)
- Add LLM analysis outputs to reusable-codex-run.yml
  - llm-analysis-run: whether analysis was performed
  - llm-completed-tasks: JSON array of completed tasks
  - llm-has-completions: boolean for quick check
  - session-event-count, session-todo-count for metrics
- Save analysis JSON file for debugging (codex-analysis-{PR}.json)
  - Uploaded as artifact alongside session JSONL
- Update keepalive workflows to pass LLM tasks
  - agents-keepalive-loop.yml
  - templates/consumer-repo/.github/workflows/agents-keepalive-loop.yml
All 63 JS tests pass, all 55 Python tests pass.
Part of #454: LangChain-enhanced task completion detection
- Add llm_provider, llm_confidence, llm_analysis_run inputs to updateKeepaliveLoopSummary
- Display a 🧠 Task Analysis section showing which provider was used
- Show a warning when a fallback provider (OpenAI or regex) was used
- Add llm-provider and llm-confidence outputs to reusable-codex-run.yml
- Update agents-keepalive-loop.yml to pass LLM info to the summary
- Update the consumer template with the same changes
- Add 3 tests for LLM provider display scenarios

This gives users visibility into whether the primary GitHub Models provider was used or whether the system fell back to OpenAI or regex.
Status: ✅ no new diagnostics
Automated Status Summary

Head SHA: 7de54ca
Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope
Tasks
Acceptance criteria
🤖 Keepalive Loop Status

PR #459 | Agent: Codex | Iteration 0/5

Current State
🔍 Failure Classification

Error type: infrastructure
Pull request overview
This PR adds LangChain-based LLM analysis for intelligent task completion detection in the keepalive automation loop. The implementation introduces a provider fallback chain (GitHub Models → OpenAI → Regex), JSONL session parsing from Codex --json output, and integrates task analysis results into PR updates and summary comments.
Key changes:
- New Python modules for LLM provider abstraction, JSONL parsing, and session analysis
- Workflow modifications to capture --json output and run LLM analysis
- JavaScript updates to display LLM provider information and merge LLM-detected tasks with commit-based detection
- 20 new tests covering Python CLI, analysis, and JavaScript display logic
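The provider fallback chain described above can be sketched roughly as follows. The class and function names mirror the PR's tools/llm_provider.py, but the availability checks shown here are assumptions, not the module's actual logic:

```python
import os

class GitHubModelsProvider:
    name = "github-models"

    def available(self) -> bool:
        # Primary provider: assumed to require a GitHub token for the Models API.
        return bool(os.environ.get("GITHUB_TOKEN"))

class OpenAIProvider:
    name = "openai"

    def available(self) -> bool:
        # First fallback: assumed to require an OpenAI API key.
        return bool(os.environ.get("OPENAI_API_KEY"))

class RegexFallbackProvider:
    name = "regex"

    def available(self) -> bool:
        # Last resort: pure keyword matching, no credentials required.
        return True

def get_llm_provider():
    """Return the first available provider in priority order."""
    for provider in (GitHubModelsProvider(), OpenAIProvider(), RegexFallbackProvider()):
        if provider.available():
            return provider
```

Because the regex provider is always available, the chain can never return nothing, which is what makes the later "which provider was used" summary display meaningful.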
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tools/llm_provider.py | LLM provider abstraction with GitHub Models/OpenAI/Regex fallback chain |
| tools/codex_jsonl_parser.py | Parser for Codex JSONL event stream from --json output |
| tools/codex_session_analyzer.py | Orchestrates session analysis using parsed JSONL and LLM providers |
| scripts/analyze_codex_session.py | CLI entry point for analyzing sessions from GitHub Actions |
| tests/tools/test_llm_provider.py | Unit tests for provider availability and fallback behavior |
| tests/tools/test_codex_jsonl_parser.py | Tests for JSONL parsing including schema variations |
| tests/scripts/test_analyze_codex_session.py | CLI integration tests with subprocess calls |
| .github/workflows/reusable-codex-run.yml | Captures --json output and runs analysis steps |
| templates/consumer-repo/.github/workflows/agents-keepalive-loop.yml | Template workflow integrating LLM task detection |
| .github/workflows/agents-keepalive-loop.yml | Passes LLM metadata to summary comment generation |
| .github/scripts/keepalive_loop.js | Displays LLM provider info and merges LLM/commit task sources |
| .github/scripts/__tests__/keepalive-loop.test.js | Tests for LLM provider display in PR summaries |
| pyproject.toml | Adds optional langchain dependencies |
| docs/plans/langchain-keepalive-integration.md | Planning document describing architecture and options |
```js
summaryLines.push(
  '',
  '### 🧠 Task Analysis',
  `| Provider | ${providerIcon} ${providerLabel} |`,
  `| Confidence | ${confidencePercent}% |`,
```
The markdown table formatting is incomplete. Lines 1231-1232 create table rows without proper markdown table syntax (missing header separator and consistent column structure). The output will render as plain text rather than a table. Add proper table headers and separators, for example:
| Field | Value |
|-------|-------|
| Provider | ... |
| Confidence | ... |
```python
# GitHub Models API endpoint (OpenAI-compatible)
GITHUB_MODELS_BASE_URL = "https://models.inference.ai.azure.com"
DEFAULT_MODEL = "gpt-4o-mini"
```
The PR description states the primary provider uses "gpt-4.1-mini", but the code actually uses "gpt-4o-mini" (line 28). This is a discrepancy between documentation and implementation. "gpt-4.1-mini" doesn't appear to be a valid OpenAI model name. Update the PR description to reflect the actual model being used.
```sh
python3 << 'PYEOF'
import os
import sys
sys.path.insert(0, '.')
```
The inline Python script sets sys.path.insert(0, '.') (line 465) but PYTHONPATH is already set to github.workspace (line 438). The relative path '.' may not resolve correctly depending on the working directory at execution time. For consistency and reliability, use the PYTHONPATH that's already configured or use an absolute path based on github.workspace.
```python
with patch("tools.llm_provider.get_llm_provider") as mock_provider:
    from tools.llm_provider import RegexFallbackProvider

    mock_provider.return_value = RegexFallbackProvider()

    result = subprocess.run(
        [
            sys.executable,
            "scripts/analyze_codex_session.py",
            "--session-file",
            str(sample_session_file),
            "--pr-body-file",
            str(sample_pr_body_file),
            "--output",
            "json",
            "--update-pr-body",
            "--updated-body-file",
            str(updated_file),
        ],
        capture_output=True,
        text=True,
        cwd=Path(__file__).parent.parent.parent,
    )

    assert result.returncode == 0
```
The mock provider is set up using a context manager (lines 256-259), but the subprocess.run call (lines 261-278) spawns a separate Python process that won't inherit this mock. The patch only affects the current test process, not the subprocess. This test will actually use the real provider chain, not the mocked RegexFallbackProvider. To properly test this, either mock at the subprocess level (via environment manipulation) or refactor the CLI to be testable without subprocess calls.
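One way to make such a test effective, sketched here under the assumption that the CLI's core logic can be factored into an importable function (the names `get_llm_provider` and `analyze_session` below are hypothetical stand-ins, not the repo's actual API), is to run the analysis in-process so `unittest.mock.patch` actually applies:

```python
from unittest.mock import patch

class RegexFallbackProvider:
    name = "regex"

# Hypothetical stand-in for tools.llm_provider.get_llm_provider.
def get_llm_provider():
    raise RuntimeError("would pick a real provider here")

def analyze_session(session_text: str, tasks: list[str]) -> dict:
    """Hypothetical in-process equivalent of the CLI body."""
    provider = get_llm_provider()
    return {"provider": provider.name, "tasks": tasks}

# Patching the hook in the *same* process is effective, unlike patching
# around a subprocess.run call, which the child process never sees.
with patch(f"{__name__}.get_llm_provider", return_value=RegexFallbackProvider()):
    result = analyze_session("session output", ["Fix parser"])
```

The subprocess-based test can still be kept for end-to-end coverage, but it should not claim to exercise a mocked provider.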
```python
# Reuse the same prompt building logic
github_provider = GitHubModelsProvider()
prompt = github_provider._build_analysis_prompt(session_output, tasks, context)

try:
    response = client.invoke(prompt)
    result = github_provider._parse_response(response.content, tasks)
```
The OpenAI provider instantiates a GitHubModelsProvider just to reuse its private methods. This creates an unnecessary object and tightly couples the two providers. Consider extracting the prompt building and response parsing into shared helper functions or methods on a base class.
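A minimal sketch of the suggested refactor follows; the method names and the trivial parse logic are illustrative, not the module's actual private API:

```python
class BaseLLMProvider:
    """Shared prompt/parse logic so concrete providers stay decoupled."""

    def build_analysis_prompt(self, session_output: str, tasks: list[str]) -> str:
        task_list = "\n".join(f"- {t}" for t in tasks)
        return f"Session output:\n{session_output}\n\nTasks:\n{task_list}"

    def parse_response(self, content: str, tasks: list[str]) -> list[str]:
        # Minimal stand-in: a task counts as complete if the model echoed it.
        return [t for t in tasks if t in content]

class GitHubModelsProvider(BaseLLMProvider):
    name = "github-models"

class OpenAIProvider(BaseLLMProvider):
    name = "openai"  # no longer needs to instantiate GitHubModelsProvider
```

With the shared behavior on a base class, each provider only overrides transport details, and neither reaches into the other's private methods.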
```python
for task in completed_tasks:
    # Escape special regex characters in task
    escaped_task = re.escape(task)

    # Pattern to match unchecked checkbox with this task
    pattern = re.compile(
        rf"^([\s]*-\s*)\[ \](\s*){escaped_task}",
        re.MULTILINE,
    )

    # Replace with checked version
    updated_body = pattern.sub(rf"\1[x]\2{task}", updated_body)
```
The regex pattern on line 109 requires an exact match of the task text after escaping, but the escaped task is inserted unescaped back into the replacement on line 114. If the task contains characters that were escaped (e.g., parentheses, brackets), the replacement will use the original unescaped version, which could cause the pattern to not match correctly on subsequent calls or create inconsistencies. Use escaped_task in the replacement as well, or use the original task variable in both places consistently.
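One robust fix, sketched with an assumed function name, is to build the replacement with a callable so the task text is always inserted literally and never re-interpreted as regex replacement syntax:

```python
import re

def check_completed_tasks(body: str, completed_tasks: list[str]) -> str:
    """Tick unchecked checkboxes for completed tasks.

    re.escape() handles metacharacters on the *match* side; a callable
    replacement inserts the task literally on the *substitution* side, so
    tasks containing parentheses, brackets, or backslashes round-trip intact.
    """
    for task in completed_tasks:
        pattern = re.compile(
            rf"^(\s*-\s*)\[ \](\s*){re.escape(task)}",
            re.MULTILINE,
        )
        body = pattern.sub(lambda m, t=task: f"{m.group(1)}[x]{m.group(2)}{t}", body)
    return body
```

Because the checked line reproduces the original task text exactly, a second pass over an already-updated body matches nothing and the operation stays idempotent.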
```python
for task in tasks:
    task_lower = task.lower()
    # Simple keyword matching
    task_words = set(task_lower.split())

    # Check for completion signals
    is_completed = any(
        word in output_lower
        and any(
            p in output_lower
            for p in ["completed", "finished", "done", "fixed", "✓", "[x]"]
        )
        for word in task_words
        if len(word) > 3
    )

    # Check for progress signals
    is_in_progress = any(
        word in output_lower
        and any(
            p in output_lower
            for p in ["working on", "started", "implementing", "in progress"]
        )
        for word in task_words
        if len(word) > 3
    )

    # Check for blocker signals
    is_blocked = any(
        word in output_lower
        and any(
            p in output_lower for p in ["blocked", "stuck", "failed", "error", "cannot"]
        )
        for word in task_words
        if len(word) > 3
    )

    if is_completed:
        completed.append(task)
    elif is_blocked:
        blocked.append(task)
    elif is_in_progress:
        in_progress.append(task)
```
The regex fallback matching logic has a high likelihood of false positives. The current logic checks if any task word (longer than 3 characters) appears anywhere in the output along with a completion keyword anywhere else in the output. For example, if the output contains "completed refactoring" and a task is "Update tests", both "update" and "tests" are unrelated to "completed refactoring", but if either word appears anywhere in the output, the task would be marked as completed. Consider requiring proximity between the task words and status keywords, or using the defined but unused COMPLETION_PATTERNS, PROGRESS_PATTERNS, and BLOCKER_PATTERNS regex patterns.
```python
# Patterns indicating task completion
COMPLETION_PATTERNS = [
    r"(?:completed?|finished|done|implemented|fixed|resolved)\s+(?:the\s+)?(.+?)(?:\.|$)",
    r"✓\s+(.+?)(?:\.|$)",
    r"\[x\]\s+(.+?)(?:\.|$)",
    r"successfully\s+(?:completed?|implemented|fixed)\s+(.+?)(?:\.|$)",
]

# Patterns indicating work in progress
PROGRESS_PATTERNS = [
    r"(?:working on|started|beginning|implementing)\s+(.+?)(?:\.|$)",
    r"(?:in progress|ongoing):\s*(.+?)(?:\.|$)",
]

# Patterns indicating blockers
BLOCKER_PATTERNS = [
    r"(?:blocked|stuck|cannot|failed|error)\s+(?:on\s+)?(.+?)(?:\.|$)",
    r"(?:issue|problem|bug)\s+(?:with\s+)?(.+?)(?:\.|$)",
]
```
The COMPLETION_PATTERNS, PROGRESS_PATTERNS, and BLOCKER_PATTERNS class variables are defined but never used. The analyze_completion method implements its own simpler keyword matching instead. Either remove these unused patterns or refactor the logic to use them.
```python
items = event.get("items", [])
if not items and content:
    # Try to parse from content
    import contextlib
```
The import of contextlib is done inside the method body rather than at the module level. This is unconventional and adds unnecessary overhead on each call. Move this import to the top of the file with other imports.
```diff
  eval "codex exec --json --skip-git-repo-check --sandbox \"$SANDBOX\" --output-last-message \"$OUTPUT_FILE\" $EXTRA_ARGS \"\$(cat \"\$PROMPT_FILE\")\"" > "$SESSION_JSONL" 2>&1 || CODEX_EXIT=$?
  else
-   codex exec --skip-git-repo-check --sandbox "$SANDBOX" --output-last-message "$OUTPUT_FILE" "$(cat "$PROMPT_FILE")" || CODEX_EXIT=$?
+   codex exec --json --skip-git-repo-check --sandbox "$SANDBOX" --output-last-message "$OUTPUT_FILE" "$(cat "$PROMPT_FILE")" > "$SESSION_JSONL" 2>&1 || CODEX_EXIT=$?
```
The codex exec command redirects both stdout and stderr to the SESSION_JSONL file (using > "$SESSION_JSONL" 2>&1). This means any stderr output (warnings, errors, debug messages) will be mixed with the JSONL events, which could cause parsing failures. Consider separating stderr or using a more robust approach like tee to capture stdout while still allowing stderr to be visible in logs, or redirect stderr separately.
Root cause: the reusable workflow was calling scripts/analyze_codex_session.py, but the scripts were only available in the Workflows repo, not in consumer repos that call the reusable workflow.

Changes:
- Expanded sparse checkout to include the scripts/ and tools/ directories
- Made the Workflows repo checkout ref dynamic (github.job_workflow_sha) so testing feature branches works correctly
- Updated PYTHONPATH to include .workflows-lib for imports
- Fixed script paths to use the .workflows-lib/ prefix
- Added an LLM dependency installation step from .workflows-lib/tools/requirements.txt
- Added requirements.txt for LLM dependencies (langchain-openai)
- Added error output display for debugging when LLM analysis fails
github.job_workflow_sha doesn't work correctly with checkout@v4 when using sparse-checkout. Instead, extract the ref from github.workflow_ref, which contains the full path including the ref (e.g., refs/heads/feature/langchain-analysis).
Temporarily disable sparse-checkout to do a full checkout and ensure the scripts/ and tools/ directories are available. Will re-enable sparse-checkout once the checkout issue is resolved.
Add debugging to understand what context variables are available and use fetch-depth: 0 to ensure the SHA is fetchable when using job_workflow_sha.
Add a new input 'workflows_ref' that callers can use to specify which ref of the Workflows repo to checkout for scripts. This is needed because github.job_workflow_sha is not available in reusable workflow context. Callers should set workflows_ref to match their @ref in the uses: line. Defaults to 'main'.
Automated Status Summary
Scope
- `GITHUB_STEP_SUMMARY` output so iteration results are visible in the Actions UI

Tasks
- `agent:codex` label
- `agents-keepalive-loop.yml` after agent run
- `buildStatusBlock()` in `agents_pr_meta_update_body.js` to accept `agentType` parameter
- `agentType` is set (CLI agent): hide workflow table, hide head SHA/required checks
- (`agent:*` label): `<!-- gate-summary: -->` comment posting (use step summary instead)
- `<!-- keepalive-round: N -->` instruction comments (task appendix replaces this)
- `<!-- keepalive-loop-summary -->` to be the single source of truth
- (`agent:*` label): `<!-- gate-summary: -->` comment
- `agent_type` output to detect job so downstream workflows know the mode
- `agents-pr-meta.yml` to conditionally skip gate summary for CLI agent PRs

Acceptance criteria
Head SHA: ac4aa0e
Latest Runs: ✅ success — Gate
Required: gate: ✅ success